(57) Abstract

In a new motion estimation system (550) and process for video compression (600), the search for matching blocks is performed in
[...] matches can be recognized with this approach. A multi-tiered approach may be employed to combine the frequency domain analysis with
existing spatial domain techniques. The approach is computationally less expensive than typical spatial domain block matching algorithms.
METHODS AND APPARATUS FOR IMPROVED MOTION
ESTIMATION FOR VIDEO ENCODING

Related Applications 

The present invention claims the benefit of U.S. Provisional Application Serial No.
60/106,867 entitled "Methods and Apparatus for Improved Motion Estimation for Video
Encoding" and filed November 3, 1998.
Field of the Invention

The present invention relates generally to improvements in video encoding, for
example, the encoding employed in such encoding standards as MPEG-1, MPEG-2, H.261,
H.263, and motion estimation. More particularly, it relates to advantageous techniques for
applying frequency domain analysis to motion estimation.
Background of the Invention 

The Moving Picture Experts Group (MPEG) video compression standards, MPEG-1
(ISO 11172-2) and MPEG-2 (ISO 13818-2), employ image processing techniques at multiple
levels. Of interest to the present invention is the processing of 16x16 macroblocks and 8x8
blocks. In the terminology used by the MPEG standards, a "frame" is an X by Y image of
pixels, or picture elements. Each pixel represents the smallest discrete unit in an image. The
"pixel", in MPEG usage, consists of three color components, one luminance and two
chrominance values, Y, Cb, and Cr, respectively. Each frame is subdivided into 16x16
"macroblocks" of pixels. A grouping of macroblocks is called a "slice". Each macroblock is
further sub-divided into 8x8 "blocks" of pixels. A macroblock typically comprises four
luminance (Y) and two or more chrominance (Cb and Cr) blocks. A more detailed description
of luminance and chrominance is included in the MPEG-1 and MPEG-2 specifications. A
sequence of frames ultimately makes up a video sequence.

One of the key compression methods used in MPEG is the discrete cosine transform
(DCT) or the two dimensional discrete cosine transform (2D-DCT) coupled with
quantization. During the encoding process, each block is transformed from its spatial-domain
representation or its actual pixel values to a frequency-domain representation utilizing a
two-dimensional 8x8 DCT. The quantization has the effect of de-emphasizing or eliminating
visual components of the block with high spatial frequencies not normally visible to the
human visual system, thus reducing the volume of data needed to represent the block. The
quantization values used by the MPEG protocols are in the form of a quantization scale
factor, included in the encoded bitstream, and the quantization tables. There are default
tables included in the MPEG specification. However, these can be replaced by quantization
tables included in the encoded bitstream. The decision as to which scale factors and tables to
use is made by the MPEG encoder.

One of the fundamental methods used by the MPEG protocol is a mechanism
whereby a macroblock within a single frame within a sequence of frames is represented in a
motion vector (MV) encoded format. An MV represents the spatial location difference
between that macroblock and a reference macroblock from a different, but temporally
proximate, frame. Note that whereas DCT compression is performed on a block basis, the
MVs are determined for macroblocks.

MPEG classifies frames as being of three types: I-frame (Intra-coded), P-frame
(Predictive-coded), and B-frame (Bidirectionally predictive-coded). I-frames are encoded in
their entirety. All of the information to completely decode an I-frame is contained within its
encoding. I-frames can be used as the first frame in a video sequence, as the first frame of a
new scene in a video sequence, as reference frames described further below, as refresh frames
to prevent excessive error build-up, or as error-recovery frames, for example, after incoming
bitstream corruption. They can also be convenient for special features such as fast forward
and fast reverse.

P-frames depend on one previous frame. This previous frame is called a reference
frame, and may be the previous I-frame, or P-frame, as shown below. An MV associated
with each macroblock in the P-frame points to a similar macroblock in the reference frame.
During reconstruction, or decoding, the referenced macroblock is used as the starting point
for the macroblock being decoded. Then a difference macroblock, preferably small, may be
applied to the referenced macroblock. To understand how this reference-difference
macroblock combination works, consider the encoding process of a P-frame macroblock.
Given a macroblock in the P-frame, a search is performed in the previous reference frame for
a similar macroblock. Once a good match is found, the reference macroblock pixel values
are subtracted from the current macroblock pixel values. This subtraction results in a
difference macroblock. Also, the position of the reference macroblock relative to the current
macroblock is recorded as an MV. The MV is encoded and included in the encoder's output.
This processing is followed by the DCT computation and quantization of the blocks
comprising the difference macroblock. To decode the P-frame macroblock, the macroblock
in the reference frame indicated by the MV is retrieved. Then, the difference macroblock is
decoded and added to the reference macroblock. The result is the original macroblock
values, or values very close thereto. Note that the MPEG encoding and decoding processes
are categorized as lossy compression and decompression, respectively.
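
The reference-difference mechanism described above can be sketched in a few lines. This is a minimal illustration only, assuming frames are plain 2D lists of luminance values and using a 2x2 block instead of a 16x16 macroblock for brevity. The function names `extract_block`, `encode_p_block`, and `decode_p_block` are hypothetical, and the DCT and quantization of the difference block are omitted, so the round trip here is exact rather than lossy.

```python
def extract_block(frame, top, left, size):
    """Return the size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def encode_p_block(current, reference, top, left, mv, size):
    """Encode one block as (motion vector, difference block).

    mv is (dy, dx): the position of the chosen reference block relative
    to the position of the current block.
    """
    cur = extract_block(current, top, left, size)
    ref = extract_block(reference, top + mv[0], left + mv[1], size)
    # Difference = current pixel values minus reference pixel values.
    diff = [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(cur, ref)]
    return mv, diff


def decode_p_block(reference, top, left, mv, diff, size):
    """Rebuild the block: fetch the referenced block, then add the difference."""
    ref = extract_block(reference, top + mv[0], left + mv[1], size)
    return [[r + d for r, d in zip(ref_row, diff_row)]
            for ref_row, diff_row in zip(ref, diff)]


# Illustrative data: the current frame is the reference frame shifted one
# pixel to the right and brightened by 1.
reference = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
current = [
    [0, 11, 21, 31],
    [0, 51, 61, 71],
    [0, 91, 101, 111],
    [0, 131, 141, 151],
]

mv, diff = encode_p_block(current, reference, 0, 1, (0, -1), 2)
decoded = decode_p_block(reference, 0, 1, mv, diff, 2)
```

With a good match the difference block holds only small values, which is what makes it cheap to encode after the DCT and quantization steps.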

The idea is that the encoding of the MV and the difference information for a given
macroblock will result in a smaller number of bits in the resulting bitstream than the complete
encoding of the macroblock by itself. Note that the reference frame for a P-frame is usually
not the immediately preceding frame. A sample ordering is given below.

B-frames depend on two reference frames, one in each temporal direction. Each MV
points to a similar macroblock in each of the two reference frames. In the case of B-frames,
the two referenced macroblocks are averaged together before any difference information is
added in the decoding process. Per the MPEG standard, a B-frame is not used as a reference
frame. The use of B-frames normally results in a more compact representation of each
macroblock.

A typical ordering of frame types would be I1, B2, B3, P4, B5, B6, P7, B8, B9, I10, and so
on. Note that the subscripts refer to the temporal ordering of the frames. This temporal
ordering is also the display ordering produced by the MPEG decoder. The encoded ordering
of these frames, found in an MPEG bitstream, is typically different: I1, P4, B2, B3, P7, B5, B6,
I10, B8, B9, and so forth. The first frame is always an I-frame. As mentioned above, an
I-frame has no temporal dependencies upon other frames; therefore an I-frame contains no
MVs. Upon completion of the decoding of this frame, it is ready for display. The second
frame to be decoded is P4. It consists of MVs referencing I1 and differences to be applied to
the referenced macroblocks. After completion of the decoding of this frame, it is not
displayed, but first held in reserve as a reference frame for decoding B2 and B3, then
displayed, and then used as a reference frame for decoding B5 and B6. The third frame to be
decoded is B2. It consists of pairs of MVs for each macroblock that reference I1 and P4 as
well as any difference information. Upon completion of the decoding of B2, it is ready for
display. The decoding then proceeds to B3. B3 is decoded in the same manner as B2. B3's
MVs reference I1 and P4. B3 is then displayed, followed by the display of P4. P4 then
becomes the backward-reference frame for the next set of frames. Decoding continues in this
fashion until the entire set of frames, or video sequence, has been decoded and displayed.
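
The relationship between display order and encoded order can be illustrated with a short sketch. It assumes frames are labeled by strings such as "I1" or "B2", and the function name `coded_order` is hypothetical. The rule it applies is the one implied by the walkthrough above: each reference frame (I or P) is emitted ahead of the B-frames that precede it in display order, since those B-frames cannot be decoded without it.

```python
def coded_order(display):
    """Reorder frame labels from display order to a typical encoded order."""
    out = []
    pending_b = []  # B-frames waiting for their future reference frame
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            # An I- or P-frame is emitted first, then the B-frames
            # that depend on it as their second reference.
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    # Assumption: trailing B-frames with no later reference are simply appended.
    return out + pending_b


display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7", "B8", "B9", "I10"]
encoded = coded_order(display)
# encoded is ["I1", "P4", "B2", "B3", "P7", "B5", "B6", "I10", "B8", "B9"]
```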

A video sequence generally approximates the appearance of smooth motion. In such
a sequence, a given block of pixels in a given frame will be similar in content to one or more
spatially proximate blocks in a range of temporally proximate frames. Given smooth real
motion within a scene represented by such a sequence, and smooth apparent motion caused
by changes in the orientation, point of view, and characteristics such as field width, for
example, of the recorder of such a sequence, the positions of blocks that exhibit the greatest
similarity across a number of temporally adjacent frames are very likely to be approximately
spatially linear with respect to a fixed reference such as the common origin of the frames.
The process of identifying the positions of such similar blocks across a range of frames is
referred to as motion estimation. The spatial relationship among such blocks is referred to as
the motion vector.

Historically, the measure of similarity between blocks has been represented by the
pixel-wise sum or mean of the absolute differences (SAD or MAD, respectively) between the
given macroblock and the reference macroblock or macroblocks. The SAD is defined as the
sum of the absolute value of the differences between the spatially collocated pixels in the
given macroblock and the reference macroblock. The MAD can be determined by computing
the SAD, then dividing by the number of pixels in the given macroblock, for example, 256 in
a 16x16 macroblock. To differentiate between current techniques and the techniques of the
present invention, the prior art spatial domain mean of absolute differences will be referred to
as SD-MAD and the prior art spatial domain sum of absolute differences will be referred to as
SD-SAD.
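
The two statistics can be written down directly from these definitions. The following is a minimal sketch, assuming blocks are equally sized 2D lists of pixel values; the function names `sad` and `mad` are illustrative only.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between collocated pixels."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def mad(block_a, block_b):
    """Mean of absolute differences: the SAD divided by the pixel count."""
    pixel_count = sum(len(row) for row in block_a)
    return sad(block_a, block_b) / pixel_count


# A 16x16 all-white macroblock against an all-black one.
white = [[255] * 16 for _ in range(16)]
black = [[0] * 16 for _ in range(16)]
sd_sad = sad(white, black)   # 255 * 256 = 65280
sd_mad = mad(white, black)   # 65280 / 256 = 255.0
```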

Much of the computational effort expended by the typical MPEG encoder is used in
locating macroblocks of pixels, within a window of macroblocks, in the reference frame or
frames that yield the least SD-SAD or the least SD-MAD for a given macroblock. Large
search window sizes are needed to compress fast motion such as might be found in a video
sequence of a sporting event.
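
To make the cost of this search concrete, the following sketch shows an exhaustive (full-search) block matcher using the SD-SAD. It is one common form of spatial-domain block matching, not necessarily the one used by any particular encoder. The names `sad_at` and `full_search`, the 8x8 frame, and the 2x2 block size are assumptions chosen to keep the example small.

```python
def sad_at(current, reference, top, left, dy, dx, size):
    """SD-SAD between the current block and the reference block offset by (dy, dx)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[top + y][left + x]
                         - reference[top + dy + y][left + dx + x])
    return total


def full_search(current, reference, top, left, size, search_range):
    """Try every offset in the window; return (best motion vector, best SAD)."""
    height, width = len(reference), len(reference[0])
    best_mv, best_sad = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad_at(current, reference, top, left, dy, dx, size)
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dy, dx), cost
    return best_mv, best_sad


# Reference frame with distinct values; the current frame is the same
# content moved one pixel down and two pixels to the right.
reference = [[10 * y + x for x in range(8)] for y in range(8)]
current = [[reference[y - 1][x - 2] if y >= 1 and x >= 2 else 0
            for x in range(8)] for y in range(8)]

best_mv, best_sad = full_search(current, reference, 3, 3, 2, 2)
```

The number of candidates grows as (2 * search_range + 1) squared, and each candidate touches every pixel of the block, which is why large search windows are expensive.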

The MPEG protocol represents an image in the frequency domain using DCT
processing with quantization for compression reasons, yet motion estimation is typically
performed in the spatial domain. For example, implementations of block matching
algorithms are readily found in the literature. These algorithms typically use an SD-MAD or
an SD-SAD computation. In the following discussion of both existing algorithms and the
new invention, the MAD statistic is used, but can readily be substituted by the SAD. The
two differ only by a single constant factor, namely the
number of values being considered, such as 256 for an MPEG macroblock. Spatial-domain
similarity analysis has as a basic assumption that the SD-MAD of two pixel macroblocks
correlates with the volume of data required to represent the 2D-DCT of the difference
between the blocks. While this assumption may be valid, it is not the only possible
correlation. Consider, as an extreme case, two frames with the first frame being completely
white (i.e., the luminance of all values is equal to 255), and the second frame being
completely black with all values equal to zero. Assume that the white frame is being used as
a reference for the black P-frame, so that it is necessary to try to match the black blocks of the new
frame against the white blocks of the I-frame. In the spatial domain, the SD-MAD of any
pair of black and white blocks is 255, the worst possible value, making them, prima facie,
poor candidates as a reference-difference pair. A typical spatial-domain motion estimator
would not consider them.

In fact, the quantized 2D-DCT of the difference between these blocks
contains exactly one non-zero quantity. Due to the characteristics of MPEG variable-
length coding, the DCT of the difference can be expressed very compactly, actually making
these blocks a good reference-difference pair, even though the blocks have an extremely poor
SD-MAD.
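
The single non-zero coefficient can be checked numerically. The sketch below computes a textbook orthonormal 8x8 2D-DCT of the black-minus-white difference block and quantizes it with a uniform step size. The uniform step of 8 is an assumption made purely for illustration and is not the MPEG quantization rule, which uses a scale factor together with quantization tables; the function names `dct2` and `quantize` are likewise hypothetical.

```python
import math

N = 8


def dct2(block):
    """Orthonormal 8x8 2D DCT-II computed directly from its definition."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = (2.0 / N) * c(u) * c(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantizer (an illustrative stand-in for MPEG quantization)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


white = [[255] * N for _ in range(N)]
black = [[0] * N for _ in range(N)]
difference = [[b - w for b, w in zip(b_row, w_row)]
              for b_row, w_row in zip(black, white)]

quantized = quantize(dct2(difference), step=8)
nonzero = [(u, v, q)
           for u, row in enumerate(quantized)
           for v, q in enumerate(row) if q != 0]
# Only the DC term at position (0, 0) survives.
```

Because the difference block is constant, all of its energy lands in the DC coefficient (8 x -255 = -2040 before quantization) and every AC coefficient is zero.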

As a more complex example, Figs. 1A and 1B show a pair of pixel blocks: a reference
block 10 and a sample block 12. These blocks have the following
values:

Reference block 10        Sample block 12

A spatial-domain difference for these blocks 10 and 12

 252  216  180  144  108   72   36    0
 216  180  144  108   72   36    0  -36
 180  144  108   72   36    0  -36  -72
 144  108   72   36    0  -36  -72 -108
 108   72   36    0  -36  -72 -108 -144
  72   36    0  -36  -72 -108 -144 -180
  36    0  -36  -72 -108 -144 -180 -216
   0  -36  -72 -108 -144 -180 -216 -252

quantifies the obvious, that there is little spatial-domain similarity between them. The SD-MAD
is 94. The zigzag ordered, quantized 2D-DCT of the difference, however,
is a great deal more promising for compression.
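
The quoted SD-MAD can be reproduced from the difference table. Each entry of that table equals 252 - 36 x (row + column), so the short sketch below rebuilds it and averages the absolute values. The exact mean is 94.5. The figure of 94 in the text is consistent with dropping the fractional part, although the rounding rule actually used is an assumption here.

```python
# Rebuild the 8x8 spatial-domain difference table for blocks 10 and 12.
difference = [[252 - 36 * (row + col) for col in range(8)] for row in range(8)]

sd_sad = sum(abs(value) for row in difference for value in row)  # 6048
sd_mad = sd_sad / 64                                             # 94.5
```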

As the foregoing analysis demonstrates, spatial-domain similarity, such as SD-MAD, is
not always the best criterion from which to determine good reference-difference block pairs for
motion estimation. While pairs that exhibit great spatial-domain similarity can very likely yield
minimal difference blocks under variable length coding, such analysis can miss pairs that exhibit
far better compression. The present invention recognizes that the spatial-domain measurement
of the prior art is not necessarily ideal, and it provides an advantageous alternative criterion and a
method of implementation that typically achieves better results than an SD-MAD or an SD-SAD
approach. As further addressed below, the approach of the present invention is also significantly
less computationally intensive.
Summary of the Invention 

One aspect of a motion estimation and compensation process and apparatus in accordance
with the present invention is the minimization of the volume of data in the frequency domain, as
contrasted with the spatial domain, needed to describe the difference between two blocks.
Additionally, in accordance with the present invention, inspection can be performed at the level
of the quantized 2D-DCTs of the blocks and not at the level of the pixel blocks. Moreover, a
smaller number of values need be inspected in the frequency domain, whereas all of the spatial
domain values must be included in the typical spatial domain analysis. Blocks that will be
missed by spatial-domain analysis will be identified. Better compression and faster computation
thereby may be achieved.

A more complete understanding of the present invention, as well as other aspects,
features and advantages of the invention, will be apparent from the following Detailed
Description and the accompanying drawings.
Brief Description of the Drawings 

Figs. 1A and 1B illustrate exemplary reference and sample blocks having little
spatial-domain similarity;

Figs. 2A and 2B illustrate exemplary reference and sample blocks that are very similar to
one another except for a positional shift;

Figs. 3A and 3B show a pair of pixel blocks, each of which is of a single constant
luminance superimposed with moderate normally-distributed high-frequency noise;

Figs. 4A and 4B illustrate suitable processing systems for carrying out the present
invention;

Fig. 5A illustrates an MPEG encoder including a spatial domain motion estimation
function;

Fig. 5B illustrates an MPEG encoder including a frequency domain motion estimation
function in accordance with the present invention; and



WO 00/27123        PCT/US99/25707



Fig. 6 illustrates an improved motion estimation process in accordance with the present 
invention. 

Detailed Description 

Referring again to Figs. 1A and 1B and the associated tables of values, the problem is to
determine how such spatially dissimilar blocks can be identified as similar in the frequency
domain. When the blocks are examined in the frequency domain, this similarity is apparent. The
following tables show the quantized 2D-DCT for each of the blocks 10 and 12:

Reference block 10        Sample block 12



For increased compression in conjunction with the run length encoding used in the
MPEG protocol, a zigzag scan ordering is used. One of the two zigzag scan orderings is shown
in the following table:

The similarity is even more apparent when the DCTs are represented in the zigzag scan order:



128 -20 -20 0 0
0 -2 0 0 0
0 0 0 0 0
0 0 0 0 2 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0

Reference block 10        Sample block 12



Not only is the frequency domain mean of absolute differences (FD-MAD) of the two
block DCTs very low (much less than 1), but if the FD-MAD is computed using the zigzag order, it only requires
the consideration of the first ten coefficients. The original ordering would require the
comparison of the first 25 coefficients, in row-major order, to achieve the same results. The
remaining 54 coefficients are all zeroes. The present invention recognizes that only a reduced
number of coefficients may be advantageously considered in the FD-MAD. Moreover, the
number of coefficients may be varied between implementations. Furthermore, the number
of coefficients under consideration may be varied during the encoding process as an adaptive
method. Also, the frequency domain sum of absolute differences (FD-SAD) can be substituted
for the FD-MAD without loss of function.
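
A truncated FD-SAD or FD-MAD over the leading zigzag coefficients might be sketched as follows. The scan generated here is the conventional zigzag ordering; the coefficient values in the two sample blocks are made up for illustration, and the names `zigzag_positions`, `fd_sad`, and `fd_mad` are hypothetical. Dividing by the number of coefficients compared, rather than by 64, is also an assumption, since the text does not state which divisor is intended for a reduced comparison.

```python
def zigzag_positions(n=8):
    """(row, col) positions of an n x n block in conventional zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; odd diagonals run top to bottom,
        # even diagonals run bottom to top.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def fd_sad(dct_a, dct_b, count):
    """Sum of absolute differences over the first `count` zigzag coefficients."""
    return sum(abs(dct_a[r][c] - dct_b[r][c])
               for r, c in zigzag_positions()[:count])


def fd_mad(dct_a, dct_b, count):
    """FD-SAD divided by the number of coefficients compared (an assumption)."""
    return fd_sad(dct_a, dct_b, count) / count


# Hypothetical quantized DCT blocks with only low-frequency content.
block_a = [[0] * 8 for _ in range(8)]
block_b = [[0] * 8 for _ in range(8)]
block_a[0][0], block_a[0][1], block_a[1][0], block_a[3][0] = 128, -20, -20, -2
block_b[0][0], block_b[0][1], block_b[1][0], block_b[3][0] = 128, -18, -20, 2

first_ten = zigzag_positions()[:10]
sad_10 = fd_sad(block_a, block_b, 10)
sad_64 = fd_sad(block_a, block_b, 64)
mad_10 = fd_mad(block_a, block_b, 10)
# Position (3, 0) is the 10th zigzag coefficient but the 25th in row-major order.
row_major_count = 3 * 8 + 0 + 1
```

In this example the first ten zigzag coefficients give the same FD-SAD as all 64, while a row-major scan would have to read 25 coefficients to reach position (3, 0).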

Also, according to the MPEG standards, the DC-coefficient, the coefficient in the 0,0
position of the DCT results, must always be encoded by itself. Consequently, the DC-coefficient
can be excluded in the computation of the FD-MAD. For instance, the first example in this text
compared an all-black block to an all-white block. The FD-MAD of the difference of these two
blocks, using all 64 values, is 4. Removing the DC-coefficient from the FD-MAD computation
would yield an FD-MAD of zero. This would indicate a very good frequency domain match.
This method would also use a variable or adaptive number of coefficients for the FD-MAD or
FD-SAD computation.

An alternative method for measuring possible matches, possibly leading to better
compression, would be to count the number of non-zero entries while computing the FD-MAD.
This count would indicate the number of symbols representing non-zero AC coefficients that
would need to be encoded and appended to the video bitstream. In the all-black versus all-white
example mentioned previously, the number of frequency or AC-coefficients that would need to
be encoded is zero. Thus, only the DC-coefficient and the end-of-block (EOB) symbols would
need to be encoded.
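
Excluding the DC-coefficient and counting non-zero AC differences can be combined in one pass. The sketch below is illustrative only: the function name `ac_match_cost` is hypothetical, and the quantized DC values of 255 and 0 for the white and black blocks are assumed values chosen so that the full 64-value FD-MAD comes out close to the figure of 4 quoted above.

```python
def ac_match_cost(dct_a, dct_b):
    """Return (FD-SAD over AC terms only, number of non-zero AC differences)."""
    ac_sad = 0
    nonzero_count = 0
    for r in range(8):
        for c in range(8):
            if r == 0 and c == 0:
                continue  # the DC coefficient is encoded separately
            diff = abs(dct_a[r][c] - dct_b[r][c])
            ac_sad += diff
            if diff != 0:
                nonzero_count += 1
    return ac_sad, nonzero_count


# Assumed quantized DCTs of an all-white and an all-black block:
# only the DC term differs.
white_dct = [[0] * 8 for _ in range(8)]
black_dct = [[0] * 8 for _ in range(8)]
white_dct[0][0] = 255

full_fd_mad = sum(abs(a - b)
                  for row_a, row_b in zip(white_dct, black_dct)
                  for a, b in zip(row_a, row_b)) / 64
ac_sad, nonzero_ac = ac_match_cost(white_dct, black_dct)
```

Here the full FD-MAD is about 3.98, while the AC-only cost and the non-zero AC count are both zero, matching the observation that only the DC-coefficient and the EOB symbol would need to be encoded.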

The foregoing example illustrates an important aspect of frequency-domain analysis in
accordance with the present invention. Inspection is not of the pixel blocks themselves, but of
the quantized 2D-DCTs of the blocks. The technique offers several advantages over
spatial-domain analysis. First, it identifies blocks that spatial-domain analysis will miss. Second,
because of the first advantage, it has the potential of finding blocks that offer better compression.
Third, once the block DCTs have been computed, identification of similar blocks is faster for
two reasons: a) the smaller number of values that must be compared, and b) in the case of
I-frame reference blocks, the DCTs for the reference blocks have been previously computed.
Further analysis is provided below.

A frequent occurrence in the search for matching blocks is that one block can be very 
similar to another block except for a positional shift. Such a pair is shown in Figs. 2A and 2B, 
which depict a pair of blocks 20 and 22 representing portions of an image of a point-illuminated 
sphere. In the reference block 20, the center of the projection of the sphere is in the upper left 
corner of the block. In the sample block 22, the center has been moved two pixels down and to 
the right. 

These blocks 20 and 22 are numerically represented as:

Reference block 20        Sample block 22

with a spatial-domain difference of:

and an SD-MAD of 66. The quantized DCTs of these blocks are:

Reference block 20        Sample block 22

which have an FD-MAD of 1. The zigzag quantized DCT of the spatial-domain difference is:

Once again, frequency domain analysis has a demonstrated ability to identify a
reference-difference pair that spatial domain analysis might have missed. Moreover, a comparison of only
the first 16 coefficients would have produced a similar result.

As a further example, Figs. 3A and 3B show a pair of pixel blocks 30 and 32, each of
which is of a single constant luminance superposed with moderate normally-distributed
high-frequency noise. The reference block 30 has a nominal intensity of 52 (averaging 57 due to
small-sample effects) while the sample block 32 has an intensity of 220 (actually averaging 221).

While these blocks bear very little visual resemblance to each other (the spatial domain MAD is
164), their respective quantized DCTs are:

Reference block 30        Sample block 32

and their frequency domain zigzag DCT difference is

with an FD-MAD of 3.

One embodiment of frequency-domain analysis in accordance with the present invention
has been implemented as a modification to an MPEG-2 encoder provided by the MPEG Software
Simulation Group (MPEG-L@netcom.com). This modified encoder and the unmodified original
have been used to generate various test results comparing frequency-domain and spatial-domain
analysis. As a test vehicle, sixteen 512x512 frames were encoded into two MPEG-2 streams,
one by each of the two encoders. The following system configurations were used in this test.

Fig. 4A illustrates a processing system 100, suitable for implementing the present
invention. System 100 includes a system processor 110, system RAM 120, program memory
130, an input storage mechanism 140 which operates as a source of frames to be processed, a
frequency domain MPEG-2 encoder 150, and an MPEG encoded output mechanism 160. Output
mechanism 160 may suitably comprise a network adapter or modem 163 and an output storage
mechanism 166 as shown in Fig. 4A, or may be suitably adapted as desired for a given
application. The source of frames 140 provides frames to the frequency domain encoder 150
which provides its output on a system bus to either the network adapter or modem 163 or storage
166. Using a system like system 100, but with a standard spatial domain encoder in place of the
frequency domain MPEG-2 encoder 150, SD-MAD encoding of an exemplary video sequence
required 119.31 CPU seconds. By comparison, FD-MAD encoding of the same
exemplary sequence in accordance with the present invention required only 40.85 CPU seconds
so that for this exemplary sequence, the approach of the present invention was 2.92 times faster.
Additionally, the encoded SD-MAD MPEG-2 sequence for this example is 326,982 bytes. The
encoded FD-MAD sequence is 322,321 bytes for a reduction of 4661 bytes, or, as a percentage,
the FD-MAD sequence is 1.4% smaller. This smaller bitstream is the result of better matches in
the frequency domain as opposed to the spatial domain.
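
The speed and size figures quoted above follow from simple arithmetic on the reported measurements, as the following check shows. The variable names are illustrative; the input numbers are the ones stated in the text.

```python
sd_seconds, fd_seconds = 119.31, 40.85   # CPU seconds, SD-MAD vs FD-MAD
sd_bytes, fd_bytes = 326_982, 322_321    # encoded stream sizes

speedup = sd_seconds / fd_seconds                 # about 2.92
bytes_saved = sd_bytes - fd_bytes                 # 4661
percent_smaller = 100 * bytes_saved / sd_bytes    # about 1.4
```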

The present invention may also be embodied in a ManArray architecture such as
processing system 200 shown in Fig. 4B. The processing system 200 includes a ManArray
processor 210, program memory 220, local memory 230, an input storage mechanism 240 which
operates as a source of frames to be processed, and an MPEG encoded output 250. Output 250
may suitably comprise a network adapter or modem 253 and an output storage mechanism 256
as shown in Fig. 4B, or may be suitably adapted as desired for a given application. In the
processing system 200, the ManArray processor 210 runs an MPEG-2 encoder program stored in
its program memory 220 and uses its local memory 230 to process data using methods described
in the present invention. The results are stored as MPEG encoded output data in storage 256 or
may be transmitted to a remote location via network adapter, modem, or other transmission
mechanism 253.

Further details of a presently preferred ManArray architecture for use in conjunction with
the present invention are found in U.S. Patent Application Serial No. 08/885,310 filed June 30,
1997, U.S. Patent Application Serial No. 08/949,122 filed October 10, 1997, U.S. Patent
Application Serial No. 09/169,255 filed October 9, 1998, U.S. Patent Application Serial No.
09/169,256 filed October 9, 1998, U.S. Patent Application Serial No. 09/169,072 filed October 9,
1998, U.S. Patent Application Serial No. 09/187,539 filed November 6, 1998, U.S. Patent
Application Serial No. 09/205,558 filed December 4, 1998, U.S. Patent Application Serial No.
09/215,081 filed December 18, 1998, U.S. Patent Application Serial No. 09/228,374 filed
January 12, 1999, U.S. Patent Application Serial No. 09/238,446 filed January 28, 1999, U.S.
Patent Application Serial No. 09/267,570 filed March 12, 1999, U.S. Patent Application Serial
No. 09/337,839 filed June 22, 1999, U.S. Application Serial No. 09/350,191 filed July 9, 1999,
U.S. Patent Application Serial No. 09/422,015 filed October 21, 1999 entitled "Methods and
Apparatus for Abbreviated Instruction and Configurable Processor Architecture", as well as
Provisional Application Serial No. 60/113,637 entitled "Methods and Apparatus for Providing
Direct Memory Access (DMA) Engine" filed December 23, 1998, Provisional Application Serial
No. 60/113,555 entitled "Methods and Apparatus Providing Transfer Control" filed December
23, 1998, Provisional Application Serial No. 60/139,946 entitled "Methods and Apparatus for
Data Dependent Address Operations and Efficient Variable Length Code Decoding in a VLIW
Processor" filed June 18, 1999, Provisional Application Serial No. 60/140,245 entitled "Methods
and Apparatus for Generalized Event Detection and Action Specification in a Processor" filed
June 21, 1999, Provisional Application Serial No. 60/140,163 entitled "Methods and Apparatus
for Improved Efficiency in Pipeline Simulation and Emulation" filed June 21, 1999, Provisional
Application Serial No. 60/140,162 entitled "Methods and Apparatus for Initiating and
Re-Synchronizing Multi-Cycle SIMD Instructions" filed June 21, 1999, Provisional Application
Serial No. 60/140,244 entitled "Methods and Apparatus for Providing One-By-One Manifold
Array (1x1 ManArray) Program Context Control" filed June 21, 1999, Provisional Application
Serial No. 60/140,325 entitled "Methods and Apparatus for Establishing Port Priority Function in
a VLIW Processor" filed June 21, 1999, and Provisional Application Serial No. 60/140,425
entitled "Methods and Apparatus for Parallel Processing Utilizing a Manifold Array (ManArray)
Architecture and Instruction Syntax" filed June 22, 1999 respectively, all of which are assigned
to the assignee of the present invention and incorporated by reference herein in their entirety.

Fig. 5A illustrates data flow for an MPEG encoder using a spatial domain motion
estimation process. The data flow starts with the incoming video at input 501. The first frame to
be encoded is an I-frame and passes directly through selector 514, without any input from
spatial-domain motion estimation (SD-ME) processor 512, to the DCT processor 502. The DCT
processor 502 converts the spatial data to its frequency domain counterpart. The data is then
quantized in quantizer 504 to reduce the amount of information as part of the compression
process. The new quantized values are directed to a variable length code (VLC) encoding processor 506
where a combination of zigzag scan reordering, run length encoding, and variable length
encoding processes are performed to create data as part of an output bitstream 503. Also, since
this I-frame is to be used as a reference frame, the quantized frequency domain data is converted
back into the spatial domain. The data is first inverse-quantized in inverse quantizer 508, then
converted with an inverse discrete cosine transform (IDCT) by an IDCT processor 510. The data
is then sent to the spatial-domain motion estimation processor 512 for storage and future
processing.

After the completion of the first frame, the second frame to be encoded is typically a
P-frame. Since the P-frame consists of MVs and difference blocks, a different flow occurs. The
current macroblock being encoded enters the flow as part of the incoming video at input 501 and
is directed to the spatial-domain motion estimation processor 512. The SD-ME processor
typically uses a block-matching algorithm, with an SD-SAD or an SD-MAD computation, to
find the motion vectors (MVs). The MVs are sent to the VLC encoding processor 506, and used
in the computation of the difference block by the SD-ME processor 512. The difference block is
then sent to selector 514 as an input to the compression and encoding steps carried out by DCT
processor 502, quantizer 504, and VLC encoding processor 506. The resulting data is appended
to the output bitstream 503. Also, since a P-frame can be used as a reference frame, the
quantized data is sent through inverse quantizer 508 and IDCT processor 510 to SD-ME
processor 512 for storage and future processing. B-frames follow a similar process, as described
in the introductory section, using two reference frames that have been stored as part of the
SD-ME process performed by SD-ME processor 512.

Fig. 5B illustrates data flow for an MPEG encoder using the frequency domain motion
estimation process of the present invention. An I-frame is encoded in a similar manner as
discussed above in connection with Fig. 5A. The incoming video on video input 551 is directed
through selector 566 and is converted from spatial-domain data to frequency domain data by the
DCT processor 552. The data is then quantized in quantizer 554. The data then is processed
through the zigzag scan ordering, run length encoding, and variable length encoding processes in
variable length encoding processor 556 and appended to an output bitstream 553. A copy of the
quantized output is also sent to frequency-domain motion estimation (FD-ME) processor 562 for storage
and future processing. A copy of the quantized output is also sent to the inverse quantization
processor 558, then converted back to spatial-domain data by the inverse discrete cosine
transform (IDCT) processor 560. This data is then sent to the motion compensation processor
564 for storage and future processing.

After completion of the first frame, the second frame to be encoded is typically a 
P-frame. Since the P-frame consists of MVs and difference blocks, a different data flow occurs. 
The current macroblock being encoded enters the flow as part of the incoming video on video 
input 551 and is directed to the DCT processor 552, via selector 566, and on to the quantization 
processor 554. The resulting quantized frequency domain data is sent directly to FD-ME 
processor 562. An FD-MAD search method is used in FD-ME processor 562 to find the best 
match. The output of the FD-ME processor 562 is an MV. A copy of the MV is sent to the VLC 
processor 556 for encoding and inclusion in the output bitstream 553. A copy of the MV is also 
sent to the motion compensation processor 564. Here, a difference macroblock is computed 
by subtracting the reconstructed data previously received from IDCT processor 560 from the 
original macroblock pixel data received from the input video on input 551. The resulting 
difference block is passed along for conversion to frequency data and quantization through 
blocks 566, 552, and 554. Since this is a P-macroblock, the output of quantizer 554 is sent to 
FD-ME processor 562 and IQ processor 558 as previously described for the I-frame output. 
Also, the output of quantizer 554 is sent to VLC processor 556 for the zigzag scan ordering, run 
length encoding, and variable length encoding processes, and the result is appended to 
the output bitstream 553. 

B-frames follow a similar process, as described in the introductory section, using two 
reference frames that have been stored as part of the FD-ME processor 562 and MC processor 
564. While various blocks of both Figs. 5A and 5B have been described as separate processors, 
this was done to facilitate and simplify the discussion of data flow and processing. It will be 
recognized that the recited "processes" might be implemented in a single processor, multiple 
processors, as functional blocks in an application specific integrated circuit (ASIC) or some 
combination of the above. For example, various processes may be implemented in a specialized 
ASIC and other processes implemented in a generalized microprocessor subject to program 
control and interfaced with the specialized ASIC. With the ManArray architecture, multiple 
processors may be assigned to different process tasks. 

Fig. 6 illustrates various aspects of a motion estimation process 600 in accordance with 
the present invention. In step 601, a first block to be matched from a current frame being 
processed is extracted. In step 603, a 2D DCT computation is performed for the block. In step 
605, a second block in a search pattern is extracted from the reference frame. In step 607, the 
FD-MAD for the first and second blocks is computed based on a subset of the quantized DCT 
values. In step 609, it is determined whether the FD-MAD computed in step 607 is less than the 
lowest FD-MAD previously computed. At the start of the motion estimation process for this 
block, the value of the lowest FD-MAD is initialized to some large value. If the answer is "yes", 
the FD-MAD just computed in step 607 replaces the previously computed lowest FD-MAD as 
the lowest FD-MAD and is stored in a memory in step 611. In step 613, it is determined whether 
there are additional blocks to be searched in the search pattern. It is noted that at step 609, if the 
answer was "no", then the process skips step 611 and proceeds directly to step 613. 

Assuming "yes" there are further blocks to be searched at step 613, the process 600 loops 
back up to step 605 and extracts the next block in the search pattern from the reference frame. 
Steps 605-613 continue to repeat until "no" there are no further blocks to be compared with the 
block extracted in step 601. Then, the current lowest FD-MAD value is compared with a 
threshold value in step 615. If the current lowest FD-MAD value is less than the threshold, in 
step 617, the two blocks corresponding to the lowest FD-MAD value are accepted as a 
reference-difference pair. In step 619, the process 600 determines if there are more blocks to be 
matched from the current frame. If the answer is "yes", the process 600 loops back up to step 601 
and the process steps are repeated. If the answer is "no", then in step 621, the process 600 returns 
success/failure and the MVs found to the MPEG encoding processes at VLC processor 556 and 
MC processor 564 of Fig. 5B, for example. Alternatively, the process 600 may proceed to the 
next frame of video to be analyzed and repeat itself. Returning to step 615, it is noted that if the 
lowest FD-MAD is not less than the threshold, the process 600 skips step 617 and proceeds 
directly to step 619. 
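
The inner loop of process 600 (steps 605 through 617) can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: it assumes that each block is already available as a list of quantized DCT coefficients in zigzag order, that FD-MAD is the mean of absolute coefficient differences over the chosen subset, and that each candidate carries its motion vector. The names fd_mad, search_reference, and the sample data are hypothetical.

```python
def fd_mad(block_a, block_b, num_coeffs=16):
    """Mean of absolute differences over the first num_coeffs coefficients.

    Both blocks are assumed to be lists of quantized DCT coefficients
    already arranged in zigzag scan order.
    """
    n = min(num_coeffs, len(block_a), len(block_b))
    return sum(abs(a - b) for a, b in zip(block_a[:n], block_b[:n])) / n


def search_reference(current, candidates, threshold, num_coeffs=16):
    """Return (motion_vector, lowest_fd_mad) for an accepted match, else None.

    candidates is an iterable of (motion_vector, coefficients) pairs taken
    from the search pattern in the reference frame.
    """
    lowest = float("inf")          # "some large value" before the first compare
    best_mv = None
    for mv, coeffs in candidates:  # steps 605 and 613: walk the search pattern
        mad = fd_mad(current, coeffs, num_coeffs)  # step 607
        if mad < lowest:           # step 609
            lowest, best_mv = mad, mv              # step 611
    if best_mv is not None and lowest < threshold:  # steps 615 and 617
        return best_mv, lowest
    return None                    # no candidate passed the threshold


# Hypothetical data: four coefficients per block, three candidate positions.
current_block = [10, 4, -2, 0]
candidate_blocks = [
    ((0, 0), [12, 1, -2, 3]),
    ((8, 0), [10, 5, -2, 0]),
    ((0, 8), [0, 0, 0, 0]),
]
accepted = search_reference(current_block, candidate_blocks, 1.0, num_coeffs=4)
rejected = search_reference(current_block, candidate_blocks, 0.1, num_coeffs=4)
```

With this sample data the candidate at (8, 0) has the lowest FD-MAD. It is accepted under the looser threshold and rejected under the tighter one, which corresponds to skipping step 617.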

Additionally, a hybrid system can be constructed using this frequency domain motion 
estimation process. A typical spatial domain block matching process or system can use a 
hierarchical approach. One example is a three-tiered approach. First, a good match using low 
SD-MAD or SD-SAD, for example, can be found on a block-aligned basis. That is, the 
coordinates of the blocks being considered for a match to the current block being encoded are 
integral multiples of 8 pixels in each direction. Then, a second tier search would look for block 
matches in the search window surrounding the current prospective block on a pixel-aligned basis. 
Finally, the third tier search would be on a half-pixel basis in the area immediately adjacent to 
the result of the second tier search. 

The FD-MAD approach in the present invention could be advantageously incorporated 
into a multiple-tier approach. For example, the first tier may use the FD-MAD approach 
described in the present invention. Then, the second and third tiers may use a typical spatial 
domain block-matching algorithm. Other permutations are possible. For example, the FD-MAD 
may be used for the first phase, and another algorithm may then refine the results for a better match. 
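
One way such a hybrid could be arranged is sketched below in Python, under stated assumptions. The first tier picks the best block-aligned candidate by FD-MAD over precomputed coefficient lists, and the second tier refines it with a pixel-aligned SD-SAD search in a small window. The half-pixel third tier is omitted. All names (two_tier_search, ref_coeffs, radius) and the toy 2x2 block size are hypothetical and chosen only to keep the example short.

```python
def fd_mad(a, b, num_coeffs):
    """Mean of absolute differences over the first num_coeffs coefficients."""
    n = min(num_coeffs, len(a), len(b))
    return sum(abs(x - y) for x, y in zip(a[:n], b[:n])) / n


def sd_sad(ref_frame, x, y, block):
    """Spatial-domain sum of absolute differences at pixel position (x, y)."""
    size = len(block)
    return sum(abs(ref_frame[y + r][x + c] - block[r][c])
               for r in range(size) for c in range(size))


def two_tier_search(cur_coeffs, cur_pixels, ref_coeffs, ref_frame,
                    radius=2, num_coeffs=16):
    """Return ((x, y), sad) for the refined match position.

    ref_coeffs maps block-aligned (x, y) positions in the reference frame
    to stored quantized DCT coefficient lists (assumed zigzag ordered).
    """
    # Tier 1: block-aligned search in the frequency domain.
    bx, by = min(ref_coeffs,
                 key=lambda pos: fd_mad(cur_coeffs, ref_coeffs[pos], num_coeffs))
    # Tier 2: pixel-aligned spatial search around the tier 1 result.
    size = len(cur_pixels)
    max_y = len(ref_frame) - size
    max_x = len(ref_frame[0]) - size
    best_sad, best_pos = None, None
    for y in range(max(0, by - radius), min(max_y, by + radius) + 1):
        for x in range(max(0, bx - radius), min(max_x, bx + radius) + 1):
            sad = sd_sad(ref_frame, x, y, cur_pixels)
            if best_sad is None or sad < best_sad:
                best_sad, best_pos = sad, (x, y)
    return best_pos, best_sad


# Toy data: a 6x6 reference frame and a 2x2 block that sits at (3, 2).
ref_frame = [[10 * y + x for x in range(6)] for y in range(6)]
cur_pixels = [[23, 24], [33, 34]]
ref_coeffs = {(0, 0): [0, 0], (2, 2): [5, 1], (4, 4): [9, 9]}
result = two_tier_search([5, 2], cur_pixels, ref_coeffs, ref_frame,
                         radius=1, num_coeffs=2)
```

Here the first tier selects the block-aligned position (2, 2), and the second tier moves the match one pixel to the right, where the spatial difference is zero. The function returns a position; a motion vector would be that position minus the position of the current block.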

The size of the subset of quantized DCT values used for the FD-MAD or FD-SAD can be 
modified either statically or dynamically. For example, a programmer may select a set number 
of values, say 16, to be used for the computation of the FD-MAD. It is expected that the first 16 
values would be used since they statistically represent the more populated values in the quantized 
8x8 DCT set of values. However, a programmer may choose to modify the size of the subset 
based on the particular requirements of the system being designed. Alternatively, the 
programmer may make the size of the subset dynamic. That is, the encoder may include an 
adaptive mechanism whereby the size value may be adjusted during the program based on a 
variety of factors including, but not limited to, compression requirements, results of past 
searches, and the like. 
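
The subset selection described above can be illustrated with a short Python sketch. It generates the conventional zigzag scan order for an 8x8 block, takes the first N coefficients, and optionally skips the DC coefficient at position 0,0 as contemplated in the claims. The adaptive rule in adapt_subset_size is purely hypothetical: the text says only that the size may be adjusted based on factors such as results of past searches, and does not specify a policy.

```python
def zigzag_indices(n=8):
    """Return the (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked bottom-left to top-right, odd ones reversed.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def select_subset(block, count=16, skip_dc=False):
    """Return the first count zigzag-ordered coefficients of a square block."""
    coeffs = [block[r][c] for r, c in zigzag_indices(len(block))]
    start = 1 if skip_dc else 0  # optionally drop the 0,0 (DC) coefficient
    return coeffs[start:start + count]


def adapt_subset_size(size, last_search_succeeded, step=4, lo=4, hi=64):
    """Hypothetical policy: shrink after a success, grow after a failure."""
    size = size - step if last_search_succeeded else size + step
    return max(lo, min(hi, size))


# Sample block whose value encodes its position: block[r][c] == 8 * r + c.
sample_block = [[8 * r + c for c in range(8)] for r in range(8)]
first_six = select_subset(sample_block, count=6)
without_dc = select_subset(sample_block, count=3, skip_dc=True)
```

A static configuration corresponds to calling select_subset with a fixed count such as 16. A dynamic one would feed the result of each search back through a rule like adapt_subset_size before the next block is processed.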

As mentioned previously, a sample image was encoded using the process of the present 
invention. This encoding resulted in a 1.4% reduction in the size of the output bitstream as 
compared to a conventional spatial-domain encoder. This process used a single-tiered approach 
with 16 quantized DCT values for each 8x8 block being used in the FD-MAD. 

The techniques of the present invention offer very fast similar-block identification. They 
make extensive use of data already computed by the MPEG encoding process. They also 
provide block-similarity criteria in the frequency domain, which offer more good potential block 
matches than spatial domain analysis and can thereby further accelerate the process. 
Additionally, block matches identified in the frequency domain that are more similar in the 
frequency domain than block matches identified in the spatial domain offer the potential for 
better VLC compression without quantization scaling loss. 

While the present invention has been disclosed in a presently preferred context, it will be 
recognized that this invention may be variously adapted to a variety of environments and 
applications consistent with the disclosure herein and the claims. 

We claim: 

1. A system for video encoding employing frequency domain based motion 
estimation comprising: 

an input for receiving frames of video input to be encoded; 
a processor; 
a memory; 

a frequency domain encoder; and 

an output port for outputting encoded video, the system operating to quantize two 
dimensional discrete cosine transform values for a reference block and a sample block and to 
encode based upon a frequency domain comparison of the reference block and the sample block. 

2. The system of claim 1 wherein the system is further operative to choose the 
quantized two dimensional discrete cosine transform values based on a zigzag scan ordering for 
the reference block and the sample block. 

3. The system of claim 2 wherein a predetermined number of coefficients from the 
zigzag scan ordering are analyzed for purposes of making said frequency domain comparison of 
the reference block and the sample block. 

4. The system of claim 3 wherein the predetermined number of coefficients is 
adaptively varied. 

5. The system of claim 1 wherein the system is operative to compute a frequency 
domain mean of absolute differences between the reference block and the sample block. 

6. The system of claim 1 wherein the sample block is an I-frame block and the 
quantized two dimensional discrete cosine transform values for a plurality of reference blocks 
have been previously computed and stored in the memory. 

7. The system of claim 1 wherein the system is operative to perform at least a two tier 
analysis, the first tier being the frequency domain comparison and the second tier being a spatial 
domain analysis. 

8. A system for video encoding employing frequency domain based motion 
estimation comprising: 

an input for receiving frames of video to be encoded; 
a ManArray processor; 
a memory; 

a program memory for storing a program which directs the ManArray processor to 
compute a two dimensional discrete cosine transform (2D DCT) for a block of a frame of video 
to be encoded and to perform frequency domain motion estimation of said block with respect to a 
first reference block, and encoding frames of video input received by the input source; and 

an output port for outputting encoded frames of video. 

9. The system of claim 8 wherein the ManArray processor subject to the program 
stored in the program memory is further operative to compute a first frequency domain mean of 
absolute differences (FD-MAD) value for the block and the first reference block; to compute a 
second FD-MAD value for the block and a second reference block of a video frame; to determine 
if the second computed FD-MAD value is less than the first computed FD-MAD value; and to 
replace the first computed FD-MAD value with the second computed FD-MAD value upon 
determining that the second computed FD-MAD value is less than the first computed FD-MAD 
value. 

10. The system of claim 9 wherein the FD-MAD values are computed on a subset of 
the quantized 2D DCT block data. 

11. The system of claim 9 wherein the FD-MAD values are computed after excluding a 
DC-coefficient, the coefficient in the 0,0 position of the 2D-DCT. 

12. The system of claim 8 wherein the ManArray processor subject to the program 
stored in the program memory is further operative to compute a first FD-SAD value for the block 
and the first reference block; to compute a second FD-SAD value for the block and a second 
reference block of a video frame; to determine if the second computed FD-SAD value is less 
than the first computed FD-SAD value; and to replace the first computed FD-SAD value with the 
second computed FD-SAD value upon determining that the computed FD-SAD value is less than 
the first computed FD-SAD value. 

13. The system of claim 12 wherein the FD-SAD values are computed on a subset of 
the quantized 2D DCT block data. 

14. The system of claim 12 wherein the FD-SAD values are computed after excluding 
a DC coefficient, the coefficient in the 0,0 position of the 2D-DCT. 

15. A method for video encoding employing frequency domain based motion 
estimation comprising the steps of: 

extracting a first block to be matched from a current frame; 

computing a two dimensional discrete cosine transform for the first block; 
extracting a second block from a reference frame; 

computing a frequency domain mean of absolute differences (FD-MAD) for the first and 
second blocks; and 

determining if the computed FD-MAD is less than a previously determined and stored 
lowest FD-MAD value. 

16. The method of claim 15 further comprising the step of replacing the previously 
determined and stored lowest FD-MAD value with the computed FD-MAD value if the 
computed value is less than said lowest FD-MAD value. 

17. The method of claim 16 further comprising the step of determining if additional 
blocks in the reference frame remain to be searched. 

18. The method of claim 15 further comprising the steps of: 
choosing the two dimensional discrete cosine transform values based on a zigzag scan 
ordering for the first block and the second block. 

19. The method of claim 18 wherein a predetermined number of coefficients from the 
zigzag scan ordering are analyzed for purposes of computing the FD-MAD value. 

20. The method of claim 17 further comprising the steps of: 
extracting a third block from the reference frame; 
computing an FD-MAD for the first and third blocks; and 

determining if the computed FD-MAD for the first and third blocks is less than the stored 
lowest FD-MAD value. 

21. The method of claim 20 wherein said FD-MAD is computed after excluding a DC- 
coefficient corresponding to the 0,0 position of the two dimensional discrete transform. 

22. The method of claim 17 further comprising the step of: 

determining if the lowest FD-MAD at the time it is determined that no further blocks in 
the reference frame remain to be searched is less than a threshold value. 

23. The method of claim 22 wherein the blocks corresponding to the lowest FD-MAD 
at the time it is determined that no further blocks in the reference frame remain to be searched 
are accepted as a reference-difference pair. 

24. The method of claim 23 wherein a spatial domain analysis is also performed on the 
blocks of the current frame with respect to the blocks of the reference frame. 

25. The method of claim 24 wherein best reference difference pairs from both 
frequency domain and spatial domain analyses are compared to determine which pairs are 
overall the best. 

26. A method for video encoding employing frequency domain based motion 
estimation comprising the steps of: 

extracting a first block to be matched from a current frame; 

computing a two dimensional discrete cosine transform (2D DCT) for the first block; 
extracting a second block from a reference frame; 

computing a frequency domain sum of absolute differences (FD-SAD) value for the first 
and second blocks; and 

determining if the computed FD-SAD value is less than a previously determined and 
stored lowest FD-SAD value. 

27. The method of claim 26 further comprising the step of replacing the previously 
determined and stored lowest FD-SAD value with the computed FD-SAD value if the computed 
value is less than said lowest FD-SAD value. 

28. The method of claim 27 further comprising the step of determining if additional 
blocks in the reference frame remain to be searched. 

29. The method of claim 26 further comprising the steps of: 

choosing the two dimensional discrete cosine transform values based on a zigzag scan 
ordering for the first block and the second block. 

30. The method of claim 29 wherein a predetermined number of coefficients from the 
zigzag scan ordering are analyzed for purposes of computing the FD-SAD value. 

31. The method of claim 28 further comprising the steps of: 
extracting a third block from the reference frame; 

computing an FD-SAD value for the first and third blocks; and 

determining if the computed FD-SAD for the first and third blocks is less than the stored 
lowest FD-SAD value. 

32. The method of claim 28 further comprising the step of: 

determining if the lowest FD-SAD value at the time it is determined that no further 
blocks in the reference frame remain to be searched is less than a threshold value. 

33. The method of claim 32 wherein the blocks corresponding to the lowest FD-SAD at 
the time it is determined that no further blocks in the reference frame remain to be searched are 
accepted as a reference-difference pair. 

34. The method of claim 27 wherein a spatial domain analysis is also performed on the 
blocks of the current frame with respect to the blocks of the reference frame. 

35. The method of claim 28 wherein best reference difference pairs from both 
frequency domain and spatial domain analyses are compared to determine which pairs are 
overall the best. 

36. A method for video encoding employing frequency domain based motion 
estimation comprising: 

receiving incoming video to be encoded on a video input; 

converting a first block of a first frame of the incoming video from spatial domain data 
to frequency domain data utilizing a two dimensional discrete cosine transform; 

quantizing the frequency domain data; 

comparing the quantized frequency domain data with quantized frequency domain data 
for a second block; 

zigzag scan ordering, run length encoding, and variable length encoding the quantized 
frequency domain data based upon the comparison, to produce output data; and 

appending the output data to an output bitstream. 

37. The method of claim 36 further comprising sending a copy of the quantized 
frequency domain data for storage and future processing by a frequency-domain motion 
estimation processor. 

38. The method of claim 36 further comprising: 

sending a copy of the quantized frequency domain data for inverse quantization by an 
inverse quantizer and conversion back to spatial domain data by a two dimensional inverse 
discrete cosine transform (2D-IDCT). 

39. The method of claim 38 further comprising: 

storing the 2D-IDCT processed data for future processing by a motion compensation 
processor. 

40. The method of claim 37 further comprising after completion of the processing of 
the first frame: 

converting a first block of a second frame from spatial domain data to frequency domain 
data utilizing the two dimensional discrete cosine transform; 

quantizing the frequency domain data for the second block; 

sending the quantized frequency domain data for the second block to the frequency 
domain motion estimation processor; and 

finding a best match. 

41. The method of claim 40 further comprising: 
producing a motion vector for the best match; and 

encoding the motion vector for inclusion in the output bitstream. 

42. A method for video encoding employing frequency domain based motion 
estimation comprising the steps of: 

extracting a first block to be matched from a current frame; 

computing a quantized two dimensional discrete cosine transform for the first block; 
extracting a second block from a reference frame; 

computing a frequency domain mean of absolute differences (FD-MAD) for the first and 
second blocks; and 

determining if the computed FD-MAD is less than a previously determined and stored 
lowest FD-MAD value. 

43. The method of claim 42 further comprising the steps of: 

choosing the quantized two dimensional discrete cosine transform values based on a 
zigzag scan ordering for the first block and the second block. 